9 research outputs found

    A communication profiler to optimize embedded resource usage

    As the number of cores in both embedded Multi-Processor Systems-on-Chip and general-purpose processors keeps rising, on-chip communication becomes increasingly important. To write efficient programs for these architectures, it is therefore necessary to understand the communication behavior of an application. We present a communication profiler that extracts this behavior from compiled sequential C/C++ programs and constructs a dynamic dataflow graph at the level of major functional blocks. In contrast to existing methods of measuring inter-program communication, our tool automatically generates the program's dataflow graph and is less demanding for the developer. It can also be used to view differences between program phases (such as different video frames), which allows both input- and phase-specific optimizations to be made. We also look at how this information can subsequently be used to guide the effort of parallelizing the application, to co-design the software, memory hierarchy and communication hardware, and to provide new sources of communication-related runtime optimizations.
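
As a rough illustration of the idea in this abstract, the sketch below builds a producer-to-consumer graph from a memory access trace. It is not the authors' tool: the trace format `(function, op, address)`, the function names, and the name `build_dataflow_graph` are all assumptions made for this example. An edge is counted whenever one function reads an address last written by a different function.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical trace record: (function name, "R" or "W", memory address).
Access = Tuple[str, str, int]


def build_dataflow_graph(trace: Iterable[Access]) -> Dict[Tuple[str, str], int]:
    """Count producer -> consumer transfers between functions.

    An edge (producer, consumer) is incremented each time `consumer`
    reads an address whose most recent writer was `producer`.
    """
    last_writer: Dict[int, str] = {}
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for func, op, addr in trace:
        if op == "W":
            last_writer[addr] = func
        elif op == "R":
            producer = last_writer.get(addr)
            # Reads of a function's own data are not communication.
            if producer is not None and producer != func:
                edges[(producer, func)] += 1
    return dict(edges)


# Made-up trace for three functional blocks.
trace = [
    ("parse", "W", 0x10),
    ("parse", "W", 0x14),
    ("decode", "R", 0x10),
    ("decode", "R", 0x14),
    ("decode", "W", 0x20),
    ("render", "R", 0x20),
    ("decode", "R", 0x20),  # decode reads its own data: no edge
]
graph = build_dataflow_graph(trace)
print(graph)
```

Splitting the trace by phase (for example, one slice per video frame) and building one graph per slice would give the kind of per-phase comparison the abstract mentions.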

    On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments

    This work investigates neuromorphic spiking neural network (SNN) models for filtering data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters the sensor data based on the particle's transverse momentum, with the goal of reducing the amount of data sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices, from data encoding to optimal hyperparameters of the training algorithm, for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network. Comment: Manuscript accepted at ICONS'2
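
The abstract does not say how charge waveforms become binary events, so the following is only one plausible sketch: it assumes a simple threshold-crossing scheme in which an event (1) is emitted at each sample where the waveform rises through a threshold. The function name `encode_events`, the threshold value, and the sample data are hypothetical.

```python
from typing import List, Sequence


def encode_events(waveform: Sequence[float], threshold: float) -> List[int]:
    """Emit 1 at each upward threshold crossing, 0 elsewhere.

    Assumed encoding for illustration only; the paper's actual
    encoding is not described in the abstract.
    """
    events: List[int] = []
    above = False  # whether the previous sample was at or above threshold
    for sample in waveform:
        now_above = sample >= threshold
        events.append(1 if now_above and not above else 0)
        above = now_above
    return events


# Made-up charge waveform with two pulses.
waveform = [0.0, 0.2, 0.6, 0.9, 0.4, 0.1, 0.7, 0.8]
events = encode_events(waveform, threshold=0.5)
print(events)
```

The resulting event stream has one entry per input sample, which is the form a spiking network would consume time step by time step.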

    System Scenario Based Resource Management of Processing Elements on MPSoC (Systeemscenario-gebaseerd beheer van taken op multiprocessor systemen-op-chip (MPSoC))

    Developing software for contemporary embedded systems, featuring heterogeneous multiprocessors, multiple power modes, complex data memory hierarchies, and advanced interconnects, is a daunting task. Emerging, dynamic software applications will be mapped onto complex MPSoCs (Multi-Processor Systems-on-Chip) through middleware components that mediate between the embedded application software and the hardware platforms. State-of-the-art tools that help to map software tasks to hardware resources are limited because they do not take into account the inter-dependencies among processing, memory, and communication constraints. They require models of the right granularity at different abstraction levels to balance the complexity of the middleware components against the optimization of system cost (e.g., energy consumption) while satisfying the real-time constraints of applications. Moreover, applications are becoming very dynamic due to their inputs and environment, and this dynamism must be handled by efficient management of MPSoC resources. We have focused on all these aspects to reduce the final system cost (e.g., energy consumption) while optimizing the middleware components, covering both the design-time and run-time phases. In the design-time phase, the dynamism in the application is captured in an appropriate number of scenarios using the System Scenario based Methodology. We have also extended the design-time phase of the Task Concurrency Management (TCM) methodology. There, for each identified system scenario, we provide a methodology to efficiently explore the search space of all possible mappings and extract only the few Pareto-optimal mapping solutions.
We propose middleware components for the run-time decisions; some are specific to applications (like the scenario detection logic that identifies which scenario the application is in) and others are generic middleware components (like run-time resource managers for MPSoC processing elements, memories, and interconnect). We have proposed various methodologies for the design-time phase activities to meet time-to-market demands, as well as run-time middleware components that balance system cost savings against overheads in performance and energy consumption. We have used the 3D-WSS based Scalable Graphics game engine (graphics) as the driver application to understand the issues in heterogeneous multiprocessor resource management and application dynamism. We have demonstrated our work on the MP3 (multimedia) decoder, H264 (multimedia) decoder, and Cavity Detector (medical imaging) applications, and on artificial TGFF (Task Graphs For Free) test benches. We show our experimental gains on our high-level virtual platform MPSoC functional simulator, which is developed in SystemC. Experimental results for our middleware components show that we can obtain either up to 400% gains in performance or up to 70% energy reduction on the trade-off axes for the demonstrated application, when compared to state-of-the-art approaches.
Contents: Abstract; 1 Introduction; 2 Related Work; 3 PE Resource Management and Methodology; 4 Design-Time Co-Exploration; 5 Overlapped Run-Time Resource Management; 6 Systematic System Scenarios Identification; 7 Systematic System Scenarios Detection Logic; 8 Combined System Scenarios and TCM Methodology Exploration; 9 Conclusion and Future Work; Appendix-A Applications; Appendix-B MPSoC Virtual Platform Simulator; Appendix-C Scenario Detection Logic Code; Bibliography. nrpages: 278. status: published
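
To make the notion of Pareto-optimal mappings in this abstract concrete, here is a minimal sketch that filters candidate mappings on two cost axes, execution time and energy, both to be minimized. It is a generic dominance filter, not the thesis's exploration method; the `Mapping` type, its field names, and the candidate values are invented for illustration.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """Hypothetical task-to-processor mapping with its two costs."""
    name: str
    exec_time_ms: float
    energy_mj: float


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if `a` is no worse than `b` on both axes and better on one."""
    no_worse = a.exec_time_ms <= b.exec_time_ms and a.energy_mj <= b.energy_mj
    better = a.exec_time_ms < b.exec_time_ms or a.energy_mj < b.energy_mj
    return no_worse and better


def pareto_front(candidates: List[Mapping]) -> List[Mapping]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up design-time exploration results for one system scenario.
candidates = [
    Mapping("A", 10.0, 5.0),
    Mapping("B", 8.0, 7.0),
    Mapping("C", 12.0, 6.0),  # dominated by A
    Mapping("D", 8.0, 9.0),   # dominated by B
    Mapping("E", 15.0, 3.0),
]
front = pareto_front(candidates)
print([m.name for m in front])
```

A run-time resource manager could then pick among the surviving points, for instance the lowest-energy mapping that still meets the current deadline. This quadratic filter is adequate for small candidate sets; a large search space would call for pruning during exploration rather than afterwards.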

    PinComm: characterizing intra-application communication for the many-core era

    As the number of cores in both embedded Multi-Processor Systems-on-Chip and general-purpose processors keeps rising, on-chip communication becomes increasingly important. To write efficient programs for these architectures, it is therefore necessary to understand the communication behavior of an application. We present a communication profiler that extracts this behavior from compiled, parallel or sequential C/C++ programs, and constructs a dynamic data-flow graph at the level of major functional blocks. In contrast to existing methods of measuring inter-program communication, our tool automatically generates the program's data-flow graph and is less demanding for the developer. It can also be used to view differences between program phases (such as different video frames), which allows both input- and phase-specific optimizations to be made. We also briefly describe how this information can subsequently be used to guide the effort of parallelizing the application, to co-design the software, memory hierarchy and communication hardware, and to provide new sources of communication-related runtime optimizations.

    SAMOSA: Scratchpad aware mapping of streaming applications

    Scratchpad memories have emerged as an alternative to caches for energy-constrained embedded systems. However, effectively mapping data onto them while considering energy/timing trade-offs remains a challenge. We present SAMOSA, a technique for mapping streaming applications to scratchpad-based MPSoCs. The contribution of this approach is a representation and transformation of the mapping problems (buffer dimensioning and allocation) into a constraint-based optimization problem. SAMOSA was used to explore energy-execution time trade-offs for mapping the H.264 decoder to a scratchpad-based MPSoC. Results show that scratchpad awareness has a significant impact on the energy-execution time trade-offs. status: published

    A survey on processing-in-memory techniques: Advances and challenges

    Processing-in-memory (PIM) techniques have gained much attention from computer architecture researchers, and significant research effort has been invested in exploring and developing such techniques. Increasing the research activity dedicated to improving PIM techniques will hopefully help deliver PIM’s promise to solve or significantly reduce memory access bottleneck problems for memory-intensive applications. We also believe it is imperative to track the advances made in PIM research to identify open challenges and enable the research community to make informed decisions and adjust future research directions. In this survey, we analyze recent studies that explored PIM techniques, summarize the advances made, compare recent PIM architectures, and identify target application domains and suitable memory technologies. We also discuss proposals that address unresolved issues of PIM designs (e.g., address translation/mapping of operands, workload analysis to identify application segments that can be accelerated with PIM, OS/runtime support, and coherency issues that must be resolved to incorporate PIM). We believe this work can serve as a useful reference for researchers exploring PIM techniques.

    Memory and communication driven spatio-temporal scheduling on MPSoCs

    Scheduling and executing software efficiently on contemporary embedded systems, featuring heterogeneous multi-processors, multiple power modes, complex memory hierarchies and advanced interconnects, is a daunting task. State-of-the-art tools that schedule software tasks to hardware resources face limitations: either (1) they do not take into account the interdependencies among processing, memory and communication constraints, or (2) they decouple the problem of spatial assignment from temporal scheduling. As a result, existing tools make sub-optimal spatio-temporal scheduling decisions. This paper presents a technique to find globally optimized solutions by co-exploring spatio-temporal schedules for computation, data storage and communication simultaneously, considering the inter-dependencies between them. Experiments on mapping exploration of an image processing application on a heterogeneous MPSoC platform show that this co-exploration methodology finds schedules that are more energy efficient than those found by decoupled exploration techniques for the particular application and target platform. ©2012 IEEE. status: published